REMARKS

The title of the specification has been amended as suggested on page 3 of the 
Office Action and the status of this application has been changed to CIP as 
suggested on Page 6 of the Office Action. 

Numbers related to cysteine proteases have been removed from the 
specification. 

A substitute declaration, signed by all the inventors and citing their 
understanding of the preliminary amendment filed on the day of filing, is 
submitted herewith. Therefore, it is respectfully submitted that the 
preliminary amendment, filed February 15, 2002, should now be considered part of 
the original disclosure, thus obviating any new matter rejections. 

The claims have been amended to delete the phrase "one of 1 . 

The term "recombinant construct" has been replaced with "chimeric gene". 
Support for this substitution can be found in the specification on page 11, lines 1-5, 
wherein the term "chimeric gene" is defined as "any gene that is not a native gene, 
comprising regulatory and coding sequences that are derived from different sources, 
or regulatory sequences and coding sequences, that are derived from the same 
source, but arranged in a manner different than that found in nature." Thus, no new 
matter has been added. 

Claims 31-34 and 37-43 were rejected under 35 USC §112, first paragraph, as 
being enabling only for polynucleotides encoding SEQ ID NO:24. Applicants 
respectfully submit that the specification discloses to one of ordinary skill in the art a 
representative number of cysteine proteases with at least 80% sequence identity to 
SEQ ID NO:24, and not just a single polynucleotide encoding SEQ ID NO:24. 

Attention is kindly invited to the specification at page 7, lines 4 to 14, which 
discloses alterations in a nucleotide sequence that are not expected to alter 
functionality, e.g., alterations that produce a chemically equivalent amino acid at a 
given site or alterations in the N- or C-terminal portions. The foregoing discussion 
shows that the skilled artisan would readily understand that the specification 
discloses a representative number of polynucleotide sequences, having different 
nucleotide substitutions, that encode cysteine proteases and that vary (within 80% 
sequence identity) from SEQ ID NO:24. 

Furthermore, submitted herewith is a copy of Karrer et al. (P.N.A.S. 90: 3063-3067, 
1993). Karrer et al. disclose conserved sequence motifs (page 3065, Fig. 2) 
shared by one subfamily of cysteine proteases. Attached hereto as Appendix A is a 
comparison of SEQ ID NO:24 of the instant invention with a Phaseolus vulgaris 
cysteine protease (NCBI GI No. 2511691). This comparison demonstrates that SEQ 
ID NO:24 possesses the same conserved sequence motifs associated with a 
subfamily of cysteine proteases and would, thus, be expected to have the 
functionality of a cysteine protease. 

In Appendix A, active site residues and cysteines in disulfide bridges are labeled. 
Residues conserved among all known cysteine proteases are underlined. The soybean 
cysteine protease possesses all of the described sequence motifs. Amino acids 
conserved between the two sequences are indicated with an asterisk (*) on the top 
row; dashes are used by the program to maximize alignment of the sequences. 

It is respectfully submitted that one skilled in the art could easily determine, 
without engaging in undue experimentation, which amino acid residues could be 
modified in SEQ ID NO:24 without changing its function. Since SEQ ID NO:24 
and the Phaseolus vulgaris sequence share only 75% identity, one of ordinary skill in 
the art would appreciate that many variants, having cysteine protease activity and 
having at least 80% sequence identity with SEQ ID NO:24, are possible. 

Claims 31-43 were rejected under 35 USC §112, first paragraph, new matter, 
as failing to comply with the written description requirement. It is believed that this 
ground of rejection has been obviated by submission of the substitute declaration. 

In view of the above discussion, it is respectfully submitted that the claims are 
now in form for allowance, which allowance is respectfully requested. 

A petition for a one (1) month extension of time accompanies this response, 
along with a substitute declaration and a copy of the Karrer et al. reference. 

Please charge any fees or credit any overpayment of fees which are required 
in connection herewith to Deposit Account No. 04-1928 (E. I. du Pont de Nemours 
and Company). 



Respectfully submitted, 

ABSTRACT A cDNA clone for a physiologically regulated 
Tetrahymena cysteine protease gene was sequenced. The nucleotide 
sequence predicts that the clone encodes a 336-amino 
acid protein composed of a 19-residue N-terminal signal se- 
quence followed by a 107-residue propeptide and a 210-residue 
mature protein. Comparison of the deduced amino acid se- 
quence of the protein with those of other cysteine proteases 
revealed a highly conserved interspersed amino acid motif in 
the propeptide region of the protein, the ERFNIN motif. The 
motif was present in all of the cysteine proteases in the data base 
with the exception of the cathepsin B-like proteins, which have 
shorter propeptides. Differences in the propeptides and in 
conserved amino acids of the mature proteins suggest that the 
ERFNIN proteases and the cathepsin B-like proteases consti- 
tute two distinct subfamilies within the cysteine proteases. 



The cysteine proteases are a family of enzymes that play an 
important role in intracellular protein degradation. These 
proteases and their cDNA clones have been isolated from 
phylogenetically diverse organisms ranging from slime mold 
to mammals. The tertiary structures of two plant cysteine 
proteases, papain and actinidin, have been solved (1, 2). The 
enzymes have two protein domains that come together to 
form the active site. Amino acid sequence homologies sug- 
gest this double domain structure is conserved in the animal 
thiol proteases cathepsins B, H, and L (3). 

The phylogenetic range of organisms for which the sequences 
of cysteine protease genes are known was extended 
by determination of the sequence of a cDNA clone for a gene 
from a ciliated protozoan, Tetrahymena thermophila. Comparison 
of the deduced amino acid sequence to those of 
known cysteine proteases revealed the presence of an amino 
acid motif in the propeptide region consisting of highly 
conserved amino acids interspersed with variable residues. 
The motif was present in 15 of 20 cysteine proteases in the 
EMBL/GenBank data base (August 1992). The five prote- 
ases that lacked the motif were all cathepsin B-like enzymes. 
Recognition of the differences in the propeptide region 
prompted comparison of the mature proteins. Alignment of 
the amino acid sequences of the proteases as two separate 
groups allowed identification of amino acids that are highly 
conserved among the proteases with the propeptide motif or 
among the cathepsin B-like proteases but strikingly different 
between the two groups. We suggest that the proteins with 
the interspersed motif and the cathepsin B-like proteases 
represent two distinct classes of cysteine proteases that can 
be distinguished by both propeptide and mature protein 
structure. 

MATERIALS AND METHODS 

Tetrahymena clone pCyP (formerly BC11) is a cDNA clone 
of an RNA that is expressed in starved, but not growing, cells 
(4, 5). The clone was isolated from a cDNA library of RNA 
from starved cells cloned into the Pst I site of pUC9 (4). DNA 
fragments were subcloned into pBluescript for sequencing. 
The sequence was scanned for open reading frames by using 
the DNA Inspector IIe program (Textco), taking into consideration 
that in Tetrahymena, as in several ciliates, TAA 
and TAG code for Gln (6-8). DNA sequences that code for 
homologous proteins were identified through a Pearson and 
Lipman (9) search of the EMBL/GenBank data base by using 
the TFASTA program. 
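
The codon reassignment noted above changes how an open reading frame is read. The following is a minimal Python sketch, not the procedure of the program cited in the text: it builds the standard codon table and a ciliate variant in which TAA and TAG encode Gln. The names `STANDARD`, `CILIATE`, and `translate` are illustrative assumptions, and the test fragment is the last 21 nucleotides of the row numbered 80 in Fig. 1.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD = dict(zip(CODONS, STANDARD_AA))

# Ciliate variant described in the text: TAA and TAG code for Gln (Q).
# TGA is left as the only stop codon.
CILIATE = dict(STANDARD, TAA="Q", TAG="Q")


def translate(dna, table):
    """Translate reading frame 0; '*' marks a stop codon.

    A trailing partial codon is ignored.
    """
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


fragment = "TCAAGCTAAAATCAAAGAGCC"  # in-frame fragment transcribed from Fig. 1
print(translate(fragment, STANDARD))  # SS*NQRA
print(translate(fragment, CILIATE))   # SSQNQRA
```

Read with the standard code, the fragment would appear to terminate after two residues; read with the ciliate code, it continues as the Ser-Ser-Gln-Asn-Gln-Arg-Ala stretch shown in the Fig. 1 translation.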

RESULTS 

The 1189-bp nucleotide sequence of Tetrahymena pCyP and 
the derived amino acid sequence of the protein encoded by 
the single long open reading frame beginning at the first AUG 
are presented in Fig. 1. The open reading frame encodes a 
protein of 336 amino acids with a calculated molecular weight 
of 37,716. A short poly(A) sequence at the 3' end of the insert 
is 33 bp downstream from a consensus poly(A) addition site, 
AAUAAA, and is presumably derived from the poly(A) tail 
of the RNA. 
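
The spacing between the poly(A) addition site and the poly(A) tract can be checked directly against the 3' end of the sequence. The sketch below is illustrative only: the function name is hypothetical, the string is transcribed from the last row of Fig. 1, and it assumes that the final A of that printed row belongs to the poly(A) tract rather than to the spacer. The signal is searched in its DNA form, AATAAA.

```python
def spacer_after_signal(three_prime, signal="AATAAA"):
    """Number of bases between the last occurrence of `signal` and the end.

    Returns None if the signal is absent.
    """
    start = three_prime.rfind(signal)
    if start < 0:
        return None
    return len(three_prime) - (start + len(signal))


# 3' end of the insert as transcribed from Fig. 1, stopping just before
# the poly(A) tract (assumption: the tract starts at the final A of the row).
tail = "TTCGTATAAACAAAAATATTATATAAAATAAATAGTCTTTCAAATATTAAGATTTGTTTAATTTT"

print(spacer_after_signal(tail))  # 33
```

Under that assumption the result agrees with the 33-bp distance stated in the text.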

The amino acid sequence of pCyP is highly homologous to 
those of cysteine proteases from a variety of eukaryotes. The 
deduced sequence of the Tetrahymena protein is shown in 
Fig. 2 along with sequences of representative cysteine pro- 
teases from a slime mold, a plant, an arthropod, and a 
mammal. The sequences have been aligned to maintain 
blocks of homology and to align cysteines that form disulfide 
bridges in papain (3). The numbers above the Tetrahymena 
sequence have been assigned with positive numbers for the 
putative mature protein and negative numbers for the prepro- 
region of the protein. Blocks of highly conserved sequence 
contain the Gln19, Cys25, His162, Asn179, and Trp181 residues 
that are present at the active site of cysteine proteases (1). 
Conservation of amino acid sequence predicts that the ma- 
ture Tetrahymena protein has a molecular weight of 22,850 
and an N-terminal Leu. 

The structure of the Tetrahymena cysteine protease gene 
suggests that it is translated as a preproenzyme. The open 
reading frame has a preponderance of hydrophobic residues 
in the first 19 amino acids of the protein, as expected for the 
signal peptide commonly found at the N terminus of cysteine 
proteases. Calculations according to the weight-matrix 
method of von Heijne (15) predict that the signal sequence 
cleavage site is after the first Ala residue (Fig. 1). 

The putative propeptide region contains a block of conserved 
amino acids from Leu(-51) to Phe(-40), which has previously 
been noted as a feature of several cysteine proteases 
(16). When the propeptide regions were aligned with minimal 

MNKKFIILSIIMLM 
1 AAAATAAAAAAAAATAAAAAAACTAAAACTTTAAAGTATGAATAAAAAATTCATCATTTTGAGTATTATCATGCTCATG 

PLCLAQDISVEKLLAYNKWSSQNQRA 
80 CCTCTCTGTTTGGCTCAAGATATAAGTGTAGAAAAACTTCTTGCTTATAATAAATGGTCAAGCTAAAATCAAAGAGCC 

YLNEDEKLYRQIVFFENLQKI KEHNS 
158 TATCTGAATGAAGATGAAAAACTGTATAGACAAATAGTTTTCTTTGAAAATTTGTAAAAAATTAAGGAGCATAACAGT 

NPNNTYS IHLNQFSDMTREEFAEKIL 
236 AACCCTAATAACACCTATTCTATCCATTTAAACTAATTCTCAGATATGACTAGAGAAGAATTTGCAGAAAAAATTCTT 

MKQDLINDYMKGIGQQATHNNANNET 
314 ATGAAATAGGATTTGATTAACGATTATATGAAGGGAATTGGTTAATAGGCTACTCACAATAATGCTAATAACGAAACT 


QMNSQNHTLAASIDWRTKGAVTSVKD 
392 TAAATGAATTCATAAAACCATACTTTAGCTGCTTCTATAGATTGGAGAACAAAAGGTGCTGTAACATCGGTTAAGGAT 

QGQCGSCWSFSAAALMESFNFIQNKA 
470 TAAGGTTAATGTGGTTCATGCTGGAGTTTCTCTGCAGCTGCCTTAATGGAGTCATTTAACTTCATTTAAAACAAAGCT 

LVNFSEQQLVDCVTPENGYPSYGCKG 
548 TTAGTTAATTTTTCTGAGTAATAACTTGTTGATTGTGTGACCCCTGAAAATGGTTACCCCTCTTATGGATGTAAAGGA 

GWPATCLDYASKVGITT L D K Y P Y V A V 
626 GGATGGCCTGCTACTTGTCTGGATTATGCCTCCAAAGTAGGTATCACAACACTAGACAAGTATCCCTATGTTGCAGTA 

QKNCTVTGTNNGFKLKKWIVIPNTSN 
704 CAGAAAAATTGTACTGTGACAGGTACAAATAATGGCTTTAAGCTTAAAAAGTGGATTGTAATTCCTAACACTTCAAAC 



DLKSALNFSPVSVLVDATNWDYYSSG 
782 GACTTAAAAAGTGCTTTA 



IFNGCNQTNINLNHAVLAVGYDEKDN 
860 ATTTTCAACGGATGTAATTAAACTAATATTAATCTTAATCATGCTGTATTAGCTGTAGGCTATGACGAAAAAGATAAC 

WIVKNSWSAGWGEHGYIRLAPNNTCG 
938 TGGATTGTTAAAAATTCTTGGAGCGCTGCTTGGGGTGAACATGGATATATTAGACTTGCTCCTAACAATACATGTGGT 

ilssniqvta* 
1016 atcttaagctctaatatataagttactgcttgaWttaggataagctattataattaaaatttattaaaatatatta 



1094 TTCGTATAAACAAAAATATTATATAAAATAAATAGTCTTTCAAATATTAAGATTTGTTTAATTTT(A)31 



Fig. 1. DNA sequence and deduced amino acid sequence of pCyP. Arrows, putative sites of posttranslational cleavage; *, stop codon; 
underlined, poly(A) addition site. The sequence contains 13 TAA and 2 TAG Gln codons. 



gaps, a consensus sequence was found between Glu(-81) and 
Asn(-62) that consists of conserved amino acids interspersed 
with variable ones: EX3RX2(V/I)FX2NX3IX3N. This "ERFNIN" 
motif, named for the single letter code of the conserved 
amino acids, was found in 15 cysteine protease genes 
identified in a combined search of the literature and the 
GenBank data base (Table 1). 
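
The consensus EX3RX2(V/I)FX2NX3IX3N maps directly onto a regular expression. The following Python sketch is an illustration of that mapping and not the search method used in the article; the names `ERFNIN`, `ERFNIN_RELAXED`, and `find_erfnin` are assumptions. The test sequence is the start of the deduced protein as transcribed from Fig. 1.

```python
import re

# E, 3 variable residues, R, 2 variable, V or I, F, 2 variable, N,
# 3 variable, I, 3 variable, N
ERFNIN = re.compile(r"E.{3}R.{2}[VI]F.{2}N.{3}I.{3}N")

# Relaxed form that also admits Trp at the Phe position, a substitution
# the text reports for some of the proteases.
ERFNIN_RELAXED = re.compile(r"E.{3}R.{2}[VI][FW].{2}N.{3}I.{3}N")


def find_erfnin(protein, pattern=ERFNIN):
    """Return (0-based start index, matched residues) or None."""
    match = pattern.search(protein)
    return (match.start(), match.group()) if match else None


# First 66 residues of the deduced protein, transcribed from Fig. 1.
tt_prepro = (
    "MNKKFIILSIIMLM"
    "PLCLAQDISVEKLLAYNKWSSQNQRA"
    "YLNEDEKLYRQIVFFENLQKIKEHNS"
)

print(find_erfnin(tt_prepro))  # (45, 'EKLYRQIVFFENLQKIKEHN')
```

The match spans 20 residues, which is the distance from Glu(-81) to Asn(-62) inclusive.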

All of the cysteine proteases with similarity to the mam- 
malian H and L cathepsins contain the ERFNIN motif and 
amino acid variants within the motif generally display a high 
degree of structural similarity to the consensus residue. 
Discounting the Trypanosoma protease, the first two amino 
acids of the motif, Glu and Arg, and the final Asn are perfectly 
conserved among the 14 enzymes. The Phe is present in 11 of 
the proteins and in the other 3 Phe is replaced by another 
amino acid with an aromatic ring, Trp. There are two 
variations in the first Asn of the motif and three examples 
in which the consensus Ile is replaced by Val. These are all 
highly conservative changes as measured by the scale of Feng 
et al. (14). 

The most unusual cysteine protease that bears an identi- 
fiable ERFNIN motif is the Trypanosoma enzyme (Table 1). 
In this protein, three residues in the motif are replaced by 
Ala, which does not bear strong structural resemblance to the 



consensus amino acid. The significance of this degree of 
variation is unknown; however, it is noted that the Try- 
panosoma cysteine protease is also unique in another re- 
spect. It has a long 108-residue extension at the C-terminal 
end in the deduced protein. Estimates of the molecular mass 
suggest that at least some of the C-terminal extension persists 
in the mature enzyme (17). 

The interspersion distance of the conserved amino acids in 
the ERFNIN motif suggests that they lie along one face of an 
α-helix. In addition, the interspersed motif is found at a 
discrete distance from the block of conserved amino acids in 
the propeptide. In 7 of the 15 cysteine proteases, 14 amino 
acids are present between the last Asn of the ERFNIN motif 
and the first Asn of the conserved block. In the other 8 
proteases, there are 10 or 11 amino acids, a number consistent 
with one fewer turn of an α-helix. This suggests that the 
interspersed ERFNIN motif plus the conserved block previously 
noted by Ishidoh et al. (16) constitute a functional 
unit. 
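
The helical-face argument can be made concrete with a helical-wheel calculation. The sketch below assumes an ideal α-helix of 3.6 residues per turn (100 degrees per residue); this geometry and the function names are assumptions for illustration and do not come from the article. Offsets are counted from the Glu of EX3RX2(V/I)FX2NX3IX3N.

```python
# Offsets of the conserved residues within the motif, counted from the Glu.
MOTIF_OFFSETS = {"E": 0, "R": 4, "V/I": 7, "F": 8, "N1": 11, "I": 15, "N2": 19}

DEGREES_PER_RESIDUE = 100  # ideal alpha-helix: 360 degrees / 3.6 residues


def wheel_angles(offsets, step=DEGREES_PER_RESIDUE):
    """Angular position of each residue on a helical wheel, in degrees."""
    return {name: (pos * step) % 360 for name, pos in offsets.items()}


def smallest_arc(angles):
    """Width in degrees of the smallest arc that contains every angle."""
    ordered = sorted(angles)
    gaps = [
        (ordered[(i + 1) % len(ordered)] - ordered[i]) % 360
        for i in range(len(ordered))
    ]
    return 360 - max(gaps)


angles = wheel_angles(MOTIF_OFFSETS)
print(angles)
# {'E': 0, 'R': 40, 'V/I': 340, 'F': 80, 'N1': 20, 'I': 60, 'N2': 100}
print(smallest_arc(angles.values()))  # 120
```

Under this idealized geometry all seven conserved positions fall within a 120-degree arc, that is, one third of the helix circumference. Similarly, removing one turn (3.6 residues) from a 14-residue spacer leaves about 10.4 residues, in line with the 10- or 11-residue spacers reported.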

The five cysteine proteases in the data base that did not 
contain the ERFNIN motif were the cathepsin B-like prote- 
ases. The propeptides of the cathepsin B-like proteases are 
shorter than those of the proteases that contain the ERFNIN 
motif. It is difficult to predict from conservation of sequence 

Tetrahymena cysteine protease (Tt) MNKKFIILS 

Dictyostelium cysteine protease (Dd) MKVILLT 

Papaya papain (Cp) lOMiPSiSKLLPVAicLrvYMCLsrGDr 

Lobster cysteine protease 2 (Ha) 

Rat cathepsin L (Rn) HTPLIAL 

-110 -90 -70 

Tt Z IMLMP I^LAQDI SVEKLLAYNKW3SQNQRAYLNEDE KL YRQ I VF FEWLQKI KEHNSUP N 

Dd VUVVFYVFVS8R0IPPEEQSQFLEFQDKFNLLYS HEEYLERFE IFKSNLGKIEELNLIAI 

Cp SIVGLYSQNDLTSTERLIQLTESKMLKHNK I YKN IDEK X YRFE I FKDWLKYIDETNKKNN 

Ha . HKVAVLFLCGVALAAASPSWEHFKGKYGRQYVOAEEDSYRRVI PEQHQXYIEEFKKKYB 

Rn AVLCIXSTAIATPKFDQTFNAQWHQWKSTHRRLYGTNEEEWRRAVWEWOII^IQLHNGEYS 



-50 -30 



NTYS . . . IRLNQFSDHTREEFAEKII>WQDLINDYMKGIGQQATHNNANNETQMNSQNHT 

KHKADTKFGVKKFADLSSDEFKNYYLNNKEAIFTDDLPVADYLDDEEINS 

SYK LGLMVF ADMSNDEFKEKYTGS I AGNYT TTE LS YEEVLND 0D VN 

NGEVTFNLAMNKFGDMTLEEFNAVMKGNIPRRSAPVSVFYPKKETGPQ 

NGlOTGFTMFJC^AFGDMTl^EEFRQIVNGYl^QKHKKGRLFQEPLMIiO 

10 30 50 

* 0 * • 0 

IAAS IDWRTKGAVTSVKDQGQCGSCHSFSAAALMESFNF IQNKM.VNFSEQQLVDCVTP E 
IPT AFDKRTRGAVTP VKNQGQCGSCMS F STTGNVEGQHF I SQNKLVS LSEQNLVDCDHE C 
IP E YVWOQKGAVTP VKNQGS COSCHAFSAWT I EG I IKIRTGNLNEYSEQELLDCDRRS 
. ATEVWWTKGAVTPVKDQGQCOSOIAFSTTGSLEG . HFLKTGSLISLAEQQLVDCSRPY 
ZPKTVtnnEKGCVTP^QGQCQSOCySASGC^ 



70 0 90 110 



NGYPSY QCXGGWP ATCLD YAS . KVGITTLDKYPYVAVQKM . CTVTGTNKGFKLKK 

ME YEGEEACDEQCNOJLQPNAYNY 1 1 KNGG I QTE SS YP YTAETGTOCNFNSAH ZGAKISN 

X OCNOaYPKSALQLVA . QYGIHYRNTYP YEGVQRY . CRSREKGP YAAKTD 

GPQ OCUeCMMNDAFD YIKANNGIDTEAAYP YEARDGS . CRFDSNSVAATCSG 

GNQ OCNOOLMDFAFQYIICENGGLDSEESYP YEAKDGS . CKYRAEYAVAKDTG 



130 150 



WIVIPNTSNDLKSA1NF . . SP VSVLVDAT . . NTOYYSSGIFNG . , CKQTN INLNHAVLAV 
FTMIPKNETVMAGYIVST -GPLAZAADAV. . EWQFYIGGVFDIP .CNPN . .SLDBGZLZV 
GVRQVQPYNEGALLYS I ANQP VS WLE AAGKDFQL YRGG Z FVGP . CGHK . . . VDBAVAAV 
HTNIASG3ETGLQ0AVRDZGPISVTIDAAHSSFQFYSSGVYYEPSCSPS . . YLDHAVLAV 
FVDZPOQEKALMKPVATV.GPISVAHDASHPSLQFYSSGIYYEPNCSSK . .DLOKGVLW 



170 190 

* * 0 

GYDEKD NWIVKSSIfSAGWGEHGYIRLAP NNTCGILSSKIQVTA 

GY3AKNT ZFRKNMP YWIVKJWWGADWGEQGYIYLRRGK MTCGVSNFVSTSI I 

GYGPN YIL IFJSWG TGWGEKGYIRIKRGTGN3YGVCGLYTSSFYPVKH 

GYGSEGGQD FWLVKKSWATSWGDAGYIKMSRfJR . . . NNKCGIATVAS YPLV 

GYGYEGTDSNKDK.YWLVKHSWGKEWGHDGYIKIAKDR. . . MNHCGLATAASYPIVN 



Fig. 2. Deduced amino acid sequence of the Tetrahymena cysteine 
protease and four representative cysteine proteases from 
Dictyostelium (10), papaya (11), lobster (12), and rat (13). *, Amino 
acids in the active site; 0, cysteines in disulfide bridges in papain; 
=, conserved in all five proteins; similar in all five proteins where 
any pair of amino acids within the group have a score of 4 or greater 
on the scale of Feng et al. (14); ., gaps introduced to maximize 
alignment; boldface type, residues conserved in all known cysteine 
proteases. The numbers above the Tetrahymena sequence indicate 
the number of the amino acid with positive numbers for the putative 
mature protein and negative numbers for the prepro- region of the 
protein. 

which amino acids in the cathepsin B propeptide are required 
for function. The high degree of similarity in the propeptides 
of the three mammalian cathepsin B-like proteases may 
simply reflect short evolutionary distance. Conservation of 
the peptide sequence -57 to -37 between the mammalian 
enzymes and cathepsin B from Schistosoma mansoni (Fig. 3) 
suggests that this region may be important for propeptide 
function, but the Haemonchus contortus propeptide shows 
no striking homology to the others. The sequences of cDNAs 
for additional cathepsin B-like proteases are required to 
determine whether there is a conserved motif in their propep- 
tide regions. 

The subfamily of cysteine proteases that contains the 
ERFNIN motif encompasses all of the enzymes described 
thus far that are similar to mammalian cathepsins H and L. 
The cathepsin B-like enzymes apparently constitute a sepa- 
rate class of cysteine proteases with respect to the structure 
of the mature protein and the propeptide. With regard to the 
overall structure, the proteases containing the interspersed 
ERFNIN motif in the propeptide have longer propeptides and 
are processed to smaller mature enzymes than the cathepsin 

Table 1. Conserved motif in the propeptide of cysteine proteases 

Gene                       ERFNIN motif             N-N 
Consensus                  EX3RX2(I/V)FX2NX3IX3N 
Cysteine protease 
  Tetrahymena thermophila  - - 
  Trypanosoma brucei                                10 
  Dictyostelium cp1                                 14 
  Dictyostelium cp2        V                        11 
  Papaya papain 
  Actinidia actinidin                               11 
  Vigna mungo              V 
  Barley aleurain          V                        10 
  Lobster cp1              V                        14 
  Lobster cp2                                       14 
  Lobster cp3                                       14 
  Mouse cathepsin L        W                        14 
  Rat cathepsin H          V                        10 
  Rat cathepsin L          V W                      14 
  Human cathepsin L        W                        14 
Nonprotease 
  Mouse CTLA-2α            V W                      14 
  Mouse CTLA-2β            M W                      14 

-, Identity of the amino acid with that of the consensus sequence; 
N-N, the number of amino acids between the last Asn of the 
ERFNIN motif and the Asn of the conserved sequence block. 
References: Trypanosoma brucei (17), D. discoideum (10), papaya 
(11), Actinidia (18), V. mungo (19), barley (20), lobster cp1, cp2, and 
cp3 (12), mouse (21), rat cathepsin H (16), rat cathepsin L (13), 
human cathepsin L (22), and mouse CTLA-2α and -β (23). 

B-like proteins. In terms of amino acid sequence, there are 
several examples of active site residues that are highly 
conserved within the ERFNIN proteases or within the cathepsin 
B-like proteases but different between the two groups 
(Table 2). (i) The seventh amino acid in the ERFNIN 
proteases is an invariant Trp, whereas there is a consensus 
Ala at this position in the cathepsin B-like proteases. (ii) 
Although the three amino acids that precede the active site 
Gln are structurally similar in the two groups, the specific 
amino acids are different between the two groups. (iii) The 
active site His is located within a block of hydrophobic amino 
acids that is interrupted by charged amino acids specific to 
one group or the other. In the ERFNIN group, the His is 
preceded by Asp or Asn; in the cathepsin B-like proteins, the 
third amino acid after the His is Lys or Arg. (iv) The amino 
acid that precedes the active site Asn is a basic Lys in 14 of 
15 ERFNIN proteases and an Arg in the last example. There 
is an invariant nonpolar Ala at this position in the cathepsin 
B-like proteins. 

In the ERFNIN proteases and the cathepsin B-like prote- 
ases, there is a motif Gly-Cys-Asn-Gly-Gly (residues 67-71 in 
Tetrahymena and 70-74 in human cathepsin B, Figs. 2 and 3). 
With the exception of the central Asn residue, this motif is 
invariant in all of the cysteine proteases examined. In papain, 
the Cys in this motif is involved in a disulfide bond and is 
located within a turn. The conservation of this motif between 
the two families of cysteine proteases suggests that it has an 
important structural role. 

DISCUSSION 

Analysis of the deduced amino acid sequence of a cysteine 
protease gene from Tetrahymena suggested that it is trans- 
lated as a preproenzyme. The putative propeptide region of 
the protein contains an amino acid motif in which highly 
conserved amino acids are interspersed with variable ones. 
This motif was present in all the cysteine protease genes in 

Table 2. Consensus sequences of cysteine proteases 

Protease           Consensus sequence 
                   6                                          160          175 
ERFNIN protease    DWRTKGAVTP . . . VKNQGQCGSCWAFSX XS SEQNLVDC  LDHGVLAVGY   WLVKHSW 
Cathepsin B        DAREQWSNCPTIXIRDQGSCGSCffAFGXiqSAEDLLTC     GGHAIRILGW   HLVAHSW 

Numbers refer to the first amino acid in the sequence according to the Tetrahymena numbering. *, Active site amino acids; 
boldface type, conserved in all cysteine proteases; |, conserved between the consensus sequences; =, conserved within the 
group and different between the two groups; ..., gap introduced to align active site residues. 



the data base, with the exception of the cathepsin B-like 
proteases. 

One possibility is that the ERFNIN motif in the propeptide 
serves to inhibit protease activity and that removal of the 
propeptide converts the protein to the enzymatically active 
form. Although it has not been demonstrated that the propep- 
tide inhibits enzyme activity of cysteine proteases, proregion 
peptides of an aspartyl protease and carboxypeptidase A 
specifically inhibit the respective mature proteases (28, 29). 

If the function of the propeptide is inhibition of enzymatic 
activity, it is not surprising that proteases that lack the 

Human (Hs) MWQLWASLCCLLVLANARSRPSrHPVS 

Mouse (Mm) MWWSLILLSCLLALTSAHDKPSFHPLS 

Rat (Rn) MWWSLIPLCLLALTSAHDKPSFHPLS 

Schistosoma mansoni (Sm) MLTSILCIASLITFLEAHISVKNZKFRPLS 

Haemonchus contortus (Hc) MKYLVUVLCTYLCSQTGADENAAQGIPLBAQ 

-50 -30 -10 


DELVNYVNKR . NTTWQAGHNFYNVDMSYLKRLCGTFLGGPKPPQRVMFTEDLK 

DDLINYINKQ . NTTNQAGRNFYNVDISYLXKLCGTVLGGPECLPGRVAFGEDID 

DDMINYINKQ.NTTWQAGRNTYNVDISYLKKPCGTVLGGPKLPERVGFSEDrN 

DDIISYINEHPNAGWRAEKSNPJHSIJ)DARIQMGARREP^DLRRKRRPTVDHNDW^ 

RLTGEPLVAYLRRSQNLFEVSSAPTPNFEQKIMDIKYKHQKLNLMVKEDPDPEVD. . . 



LP ASFDAHEQWPQCPT I KE I RDQGSCOSCBAFGAVRAI SDR I C I HTNAHVSVE VSAED 
LPETFDABEQWSNCPT IGQI RDQGSCQSCWAFGAVEAXSDRTC IHTNGRVHVE VSAED 
LPESFDARE0WSKCPTIAQZRDQGSC6SCRAFGAVEAMSDRICIHTN. . VNVE VSAED 



IPPSYDPBDVWKNCnTY . I RDQANCGSCKAVSTAAAI SDR I C IASKAEKQVN I SATD 



70 90 110 



LLTCCGSMCGDOCNOOYPAEAKNPWTRKGLVSGGL YESHVGCRP YS IPPCEHHVNGSR 
LLTCCGIQCGDGCHQGYPSGAireTiraCKGLVSGCVYDSHIGCLPYTIPPCEHHVHGSR 
LtTCCGIQOmCCNOOYPSGAGlIPimuCGLVSGGVYNSHIGCI^YTIPPCEHHVKGSR 
LLTCC . ESCGLOCEOOILGPAITOYWVKEGIVTASSKENHTGCEP YPFPKCEHHTKGKY 
IMTOCRPQCGDOCEOGNP IEAWKYF I YDGWSGGE YLTKD VCRP YP I HPCGHHGMDTY 



PPCTG2G . DTPKCSKICEPGYSPTYKQDKHYGYKSYSVSKSEKDIMAEIYKNGPVEGA 
PPCTGEG . DTPRCNKSCEAGYSPSYKEDKHFGYTSYSVSNSVKE IMAE I YKNGPVEGA 
PPCTGEG . DTPNCWKMCEAGYSTSYKEDKHYGYTSYSVSDSEKEIMAEI YKNGPVEGA 



YGECRGTAPTPPCKRKCRPGVRKMYRIDKRYGKDAYIVKQSVKAIWSEILRNGPVVAS 



190 210 230 

FSVYSDF LL YKSGVYQHVTGEMMGGHAI R I LGWGVENGTP YWL^ 
FTVFS D F LT YKS GVYKH EAGDMMG GHA1 R I LVWGVENGVP Y WLAAHSWNLDM GDNGF F 
FTVF S DP LTY KS GVY KH EAGD VMGGHAt R I LGHG I ENGVP Y WL VABSKNVDMG ENGFF 



250 



KILRGQDHCGI ESE WAGI PRTOQYWEK 1 

K I LRGENHCG I ESE I VAGI PRTDQYWGRF 

K I LRGENHCG I ESE I VAGI PRTQ 

RIVRGRDECSIESEVIAGRIN 

RI IRGTNDCGIEGTIAAGIVOTESL 



Fig. 3. Derived amino acid sequence for cathepsin B-like cys- 
teine proteases from human (24), mouse (24), rat (25), Schistosoma 
(26), and Haemonchus (27). *, Amino acids in the active site; =, 
conserved in the propeptide region of the first four proteins; ~, similar 
in the first four proteins where any pair of amino acids within the 
group have a score of 4 or greater on the scale of Feng et al. (14); ., 
gaps introduced to maximize alignment; boldface type, residues 
conserved in all known cysteine proteases. The numbers above the 
human sequence indicate the number of the amino acid with positive 
numbers for the putative mature protein and negative numbers for 
the prepro- region. 



ERFNIN motif also show differences in the structure of the 
enzymatic moiety. It is difficult to identify a region of the 
propeptide that might serve a similar function for the cathep- 
sin B-like proteases because the number of available se- 
quences and the phylogenetic distribution of the organisms 
from which they have been obtained are limited. 

A search of the data base was done to determine whether 
the ERFNIN motif was present in proteins other than cysteine 
proteases. The search uncovered CTLA-2α and CTLA-2β, 
cDNA clones of mouse RNAs that are specifically 
expressed in T lymphocytes. The deduced CTLA-2 gene 
products are small proteins that have striking homology to 
the cysteine protease propeptides but are not associated with 
a catalytic moiety (23). Their function is unknown but, 
because the propeptides of several proteases serve to inhibit 
protease activity, it was suggested that the CTLA-2 proteins 
may regulate the activity of unidentified cysteine protease(s). 

Analysis of the 20 cysteine protease genes in the EMBL/ 
GenBank data base suggests that they can be divided into two 
distinct classes. Phylogenetic distribution suggests that the 
two types of cysteine proteases were established early in 
evolution. The ERFNIN proteases are found in organisms 
ranging from protozoa to mammals. It is likely that the 
cathepsin B-like enzymes evolved before the divergence of 
the platyhelminthes. 

GM39890 from the National Institutes of Health and by National 

Appendix A 

Comparison of the amino acid sequences of the cysteine proteases from soybean clone 
srr3c.pk003.d10:fis (SEQ ID NO:24) and Phaseolus vulgaris set forth in NCBI General Identifier No. 2511691. 
Amino acids conserved between the two sequences are indicated with a single asterisk above the conserved 
residues. Dashes are used by the program to maximize alignment of the sequences. Residues conserved in all 
known cysteine proteases are underlined. Amino acids in the active site and cysteines in disulfide bridges are 
indicated by a dot and by a triangle below the conserved residues, respectively (Karrer et al. (1993) P.N.A.S. 90, 
3063-3067). 



** * **** * * * * * +++*+*+ * ********** ** ** 

SEQ ID NO: 24 MANLSLLFFGLLLFSA-AVATVERIDDEDNLLIRQWPDAE-DHHLLNAEHHFSAFKTKF 
Gi : 2511691 MARYTL--CALLLFAAVAAAAGGASTDADDILIRQWPEGEVEDHLLNAEHHFSTFKSKF 

***** ******* ** * ** * ********** ****** ** ******** 
SEQ ID NO: 24 AKTYATQEEHDHRFRIFKNNLLRAKSHQKLDPSAVHGVTRFSDLTPSEFRGQFLGLKPLR 
Gi : 2511691 GKTYATKEEHDHRFGVFKSNMRRARLHAQLDPSAVHGVTKFSDLTPAEFHRKFLGLKPLR 

** ********* ** ****** **** ** ***** ***** ********* ** * 

SEQ ID NO: 24 LPSDAQKAPILPTSDLPTDFDWRDHGAVTGVKNQGSCGWCWSFSAVGALEGAHFLSTGGL 
Gi : 2511691 LPAHAQKAPILPTNNLPKDFDWRDKGAVTNVKDQGSCGSCWSFSTTGALEGAHFLATGEL 

• A • 

************* ***** * ********** ***** ** ** ******** * 
SEQ ID NO: 24 VSLSEQQLVDCDHECDPEERGACDSGCNGGLMTTAFEYTLKAGGLMREEDYPYTGRDRGP 
Gi:2511691 VSLSEQQLVDCDHVCDPEEYGSCDSGCNGGLMNNAFEYLIGSGGVQREKDYPYTGRD-GT 

▲ ▲ 

************ * ** ******************* **** **** ************ 

SEQ ID NO: 24 CKFDKSKIAASVANFSWSLDEEQIAANLVKNGPLAVGINAVFMQTYIGGVSCPYICGKH 
Gi : 2511691 CKFDKSKIAASVSNYSVISLDEEQIAANLV™GPLAVAINAVYMQTYVGGVSCPYICGKH 

▲ ▲ • 

*********** ********************** ** ********************* 

SEQ ID NO: 24 LDHGVLLVGYGSGAYAPIRFKEKPYWIIKNSWGESWGEEGYYKICRGRNVCGVDSMVSTV 
Gi : 2511691 LDHGVXLVGYGEGAYAPIRFKEKPYWIIKNSWGENWGGNGYYKICRGRNVCGVDSMVSTV 

• • ▲ 

*** * 

SEQ ID NO: 24 AAIHVSNH 



Gi: 2511691 GAIHASTQ 
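
The asterisk rows above amount to a column-by-column identity count. As a rough illustration, the Python sketch below computes percent identity for two pre-aligned sequences. It is not the alignment program used to prepare this appendix, the function name is hypothetical, and it assumes one particular convention: columns containing a gap in either sequence are excluded from the denominator. Other conventions give different percentages.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap.

    Both inputs must already be aligned and therefore of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Final alignment block of this appendix (C-terminal eight residues).
print(percent_identity("AAIHVSNH", "GAIHASTQ"))  # 50.0
```

For this last block, four of the eight columns are identical, matching the four asterisks printed above it.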



